allenkaci_92524_7721337_Internship Activity Plan Outline.pdf
  • Project Objective: Develop a system to be used for surveillance across multiple lines of business (LOBs) within the Wells Fargo Securities Asset-Backed Finance group.
  • The current system has each LOB completing surveillance on its own agenda.
  • If I wanted to look at one client who owns multiple facilities and aggregate their data, I could not do that in the current system.
  • The future system will provide interactive dashboards with dynamic data that is more useful.
  • 1/13-1/27: Compile necessary documents from each LOB and produce a project outline of goals,
  • 3/23-4/6: Finish any outstanding work, submit tool and interactive reporting dashboard(s) to upper
  • Outcomes Expected: A tool to be used by all LOBs to improve the quantity and quality of portfolio surveillance.
  • An enhanced, dynamic, and interactive reporting dashboard that makes portfolio surveillance easier for upper management.
  • A cohesive system that provides consistent reporting from all LOBs and removes unnecessary information from what is presented.
  • Risks: Lack of participation by all LOBs; inability to gain necessary info from parties resistant to change.
almasriosama_LATE_94996_7777553_Internship Activity Plan - Osama AlMasri.pdf
  • Identify trends centered around talent retention and turnover, including predicting the risk of exit by employees based on a number of factors including age, gender, geographical region, type of employee (blue collar vs. white collar), and type of job.
  • Leverage historical data on key characteristics of past resignations and recommend actions to prevent talent from leaving.
  • Identify the issue: gather detailed requirements from stakeholders that will serve as specific use cases.
  • Explore Alteryx as a potential data wrangling tool.
  • Build model(s); interpret and visualize results as needed.
  • Consider non-HR data (financial, competition, market indices) to further interpret results as
  • Create a final presentation or dashboard to communicate findings.
  • Document current challenges that led to the need for the project.
  • Identify HR systems needed to answer questions.
  • Perform data cleaning, including imputations if needed.
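The cleaning-with-imputation step above can be sketched in plain Python; the employee fields and values here are hypothetical stand-ins for the real HR extracts, and production work would more likely use pandas or Alteryx as mentioned:

```python
from statistics import mean, mode

# Hypothetical employee records; None marks a missing value.
employees = [
    {"age": 34, "region": "EMEA", "resigned": True},
    {"age": None, "region": "APAC", "resigned": False},
    {"age": 41, "region": None, "resigned": False},
    {"age": 29, "region": "EMEA", "resigned": True},
]

def impute(records, numeric_fields, categorical_fields):
    """Fill numeric gaps with the column mean, categorical gaps with the mode."""
    filled = [dict(r) for r in records]  # copy so the raw data is untouched
    for f in numeric_fields:
        observed = [r[f] for r in filled if r[f] is not None]
        for r in filled:
            if r[f] is None:
                r[f] = mean(observed)
    for f in categorical_fields:
        observed = [r[f] for r in filled if r[f] is not None]
        for r in filled:
            if r[f] is None:
                r[f] = mode(observed)
    return filled

clean = impute(employees, ["age"], ["region"])
```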
annamrajusaismaran_LATE_125662_7889962_Activity Plan.pdf
  • Project Title: Estimating Dynamic Ideal Points for State Supreme Courts
  • Working on a small sample of data on a single state to run the ideal points estimation algorithm.
  • Running the algorithm for fast ideal points on small samples of data individually on each
  • Validating the results of Imai's fast ideal points algorithm with the help of Bayesian models of ideal points using Markov chain Monte Carlo algorithm scores.
  • Validating the results of each individual state against the original source data and establishing the
  • Analyzing the trends in decisions made by the supreme courts across the years with the help of
  • Creating dynamic trend charts for each individual state and select Supreme Court judges to
  • Trends of the political standing of each individual state across the years on various issues.
  • The potential risk is the high computational power required for the algorithm to run simulations on the original data set.
bapatanjali_LATE_126445_7842639_Internship_Activity_plan.pdf
  • I would then calculate a rating out of 100 using these predicted values, which would then be viewed by pavement managers in the app.
  • Predicting ride quality using accelerometer data: The device also captures accelerometer data, which could be used to predict the ride quality of the road.
  • Set up Python libraries and tools including SQLAlchemy, XGBoost, Jupyter Notebook, and Heroku.
  • Take geospatial courses on DataCamp and read articles and blogs related to
  • Continue onboarding more cities and extract geospatial data using OpenStreetMap.
  • Match the edited shapefile with the client's data and upload it into the database.
  • Clean and mine the device-collected imagery data to avoid redundant images involving multiple drives and cul-de-sacs.
  • Analyze the final data and predict the severity of the different types of cracks using
  • We will try to run various models for predicting ride quality and crack severity.
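As a rough illustration of how accelerometer data can feed a ride-quality signal, a windowed root-mean-square of vertical acceleration is one common proxy (higher RMS suggests a rougher ride); the readings below are made up:

```python
import math

def roughness_index(accel_z, window=4):
    """Root-mean-square of vertical acceleration per fixed-size window."""
    scores = []
    for start in range(0, len(accel_z) - window + 1, window):
        seg = accel_z[start:start + window]
        scores.append(math.sqrt(sum(a * a for a in seg) / window))
    return scores

# Hypothetical readings: a smooth stretch followed by a rough one.
smooth_then_rough = [0.1, -0.1, 0.2, -0.2, 1.5, -1.2, 1.8, -1.6]
scores = roughness_index(smooth_then_rough)
```

A per-window score like this could be one input feature to the ride-quality models, alongside speed and GPS-matched road segments.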
bartonronald_LATE_125819_7784590_Activity Plan RB.pdf
  • Ultimately this will be a tough decision, as any increase in model performance must be weighed against the cost of acquiring the data.
  • A small improvement, while successful, may not be worth the cost of building a data pipeline for this new source.
  • Much of the project will center on wrangling data: loading, cleaning, and imputing the third-party dataset.
  • Some feature engineering will be performed using domain knowledge, if needed, to attempt to improve a model's predictive power.
  • There will be univariate analysis of the individual variables that the third-party data source introduces.
  • We will work with business partners across the company to gain domain knowledge or receive guidance on the best steps to move the project forward.
  • - 2nd half of Feb: build core model with the most useful square-footage variable
  • Compare Pitney Bowes square-footage variables to the core model
  • Outcomes Expected: A full analysis of the property variable I have been assigned, and a recommendation on whether we should purchase this data.
  • Risks: The data does not provide any lift in the model, and the core variables perform better or equally well.
cantyjeremiah_LATE_35429_7862622_Internship Activity Plan Outline.pdf
  • Project Objective: Create an algorithm using longitudinal data to predict the different run times of high school boys' and girls' 5K runners.
  • After formulating an algorithm, use it to explain the data, determine why athletes drop out and what variables affect run times, and predict future run times across the high school life cycle.
  • 1/27 – 2/10: Collect, clean, and organize data to be used for the algorithm; adjust expectations of the project with the mentor; convert times into minutes; and create a hypothesis.
  • Begin binning runners' results based on run time speeds in order to formulate overall results of athletes by run times.
  • Create an unsupervised cluster analysis on the data to determine which characteristics automatically group certain individuals together.
  • Discover how many students dropped out of cross-country throughout high school, see if survivorship bias exists within the data, and draw conclusions about the initial hypothesis.
  • Create other helpful visualizations that coincide with the data analysis in order to communicate a complete understanding of the variables that affect high school athletes' run times.
  • If time permits, add updated data, predict which runner will win the next 5K event over the next couple of years, and compare to actual results.
  • Outcomes Expected: An accurate prediction model that displays the regression and progression of athletes over time and can explain the error rate for runners' predicted future run times.
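The binning step described above could start as simply as fixed cut points on the converted times in minutes; the tier edges below are hypothetical placeholders for whatever thresholds the data suggests:

```python
def bin_run_times(times_min, edges=(17.0, 19.0, 21.0)):
    """Assign each 5K time (in minutes) to a speed tier.
    Tier 0 is fastest; the edges are hypothetical cut points."""
    bins = []
    for t in times_min:
        tier = sum(t >= e for e in edges)  # count of edges the time exceeds
        bins.append(tier)
    return bins

times = [16.5, 18.2, 20.7, 23.1]
tiers = bin_run_times(times)
```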
cardenasarturo_27332_7715287_Internship Activity Plan Outline-1.pdf
  • Village Hearts Beat (VHB) Project
  • Mentor/Company: Michael Dulin, MD, PhD | UNC Charlotte | College of Health & Human Services
  • The Academy for Population Health Innovation, https://www.aphinnovation.org/
  • Build a RedCap study database and web entry forms.
  • Build interactive dashboards in R-Shiny for participants, coaches, and the program staff.
  • how close we are when the coaches collect the actual data.
  • Tasks: Start and initial setup; business understanding and requirements gathering; RedCap Cloud tutorials and training; RedCap database and web form preliminary design; RedCap prototypes, user testing, feedback and modifications; RedCap test release; RedCap final modifications and data collection; RedCap production deployment; RedCap feedback; R-Shiny dashboards preliminary design; R-Shiny prototypes, user testing and modifications; R-Shiny test release; R-Shiny production deployment; R-Shiny feedback; Data science: data requirements gathering; Data science: data understanding and preparation; Data science: model proposals, regression, classification, and others.
  • The RedCap web entry form should be easy to use and intuitive for the user. The RedCap database should have all the necessary fields for the 3 objectives of the project.
  • Data science: the models should be able to predict actual results as closely as possible.
  • Time: the proposed schedule might require overtime because of deadlines. User acceptance: making sure the end users like what they see and are comfortable using it.
chennupatisivateja_LATE_126528_7927595_Internship Activity Plan_schennu1.pdf
  • Recent research about Sick Building Syndrome (SBS) has increased attention to the effect of the built environment on occupants' health, especially at long-term occupancy levels such as office environments.
  • The main objective of this project is the creation of an optimized algorithm based on the occupant's bio-responses to provide both health improvement and energy savings.
  • To build models, we will collect data in an isolated environment set up in our lab.
  • We will use thermal and RGB cameras, environmental sensors, and a wristband to collect the diverse data required to build individual models.
  • Our first goal is to analyze the regions of interest (ROIs) in the images captured using thermal and RGB cameras.
  • We create a dataset with the information that comes from the images captured using the cameras, indoor conditions from the sensors, and skin temperature and heart rate from the wristband.
  • Unlike the lab environment, it is hard to use a wristband to collect data in a location that has multiple occupants, where people move from one place to another.
  • Since the project depends on a few external resources, project timelines might change based on the time taken for data collection.
  • Since my participation is limited in setting up the cameras and calibrating them, I can spend that time building various models with the available data.
  • As per the timeline, we have now finished running the initial tests and have started collecting data in the lab environment.
coimbatorenatarajan_LATE_70048_8058403_Internship Activity Plan_NATARAJAN SHANTHI COIMBATORE-2.pdf
  • Name: NATARAJAN SHANTHI COIMBATORE
  • Date: 02-21-2020
  • Project Title: Optimized Dynamic Scheduling Engine
  • Mentor/Company: Jason Darwin / Spectrum Reach
  • Dates of Internship: 02/17/2020 to 05/04/2020
  • Project Objective: Build a prototype of the Optimized Dynamic Scheduling Engine logic
  • Methodology: Classification, regression, time series analysis, analytics
  • Major Tasks:
  • Outcomes Expected: Anomalies in data, pattern recognition, correlations among data elements, a prediction model for automated scheduling based on the business rules.
  • Risks: Volume of data and attributes, gathering all the business rules used by the current Traffic team, getting the right data set.
copeblake_LATE_23876_7839751_Internship Activity Plan .pdf
  • During this time period my goal is to get acquainted with the Sports Atlas team and the organization, and to get started on my first task of web scraping attendance data for the five major sports leagues in America.
  • Perform initial exploratory data analysis on article flagging for references.
  • Next I will work with the team to develop an initial framework for Natural Language Processing (NLP) tagging of informative articles.
  • Next I will build a dataset from an API and create initial article tagging with string/n-gram searches.
  • At this point I will assess initial tag quality and optimize the model for peak performance.
  • Outcomes Expected: Successfully launch Sports Atlas as a well-functioning data product for our vendors.
  • The outcome during this time period is to successfully scrape attendance data for the NFL, NBA, MLB, NHL, and MLS and load it into our database.
  • At the end of this time period, a framework will be developed for our NLP model, along with a dataset for initial article tagging.
  • Potential risks include negative feedback from users of Sports Atlas and difficulty implementing our NLP model.
  • We are launching Sports Atlas in the middle of February, and feedback from users could cause issues with the company's timeline.
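The string/n-gram tagging step above can be prototyped with a small phrase dictionary; the tags and phrases here are hypothetical placeholders for the team's real lists:

```python
import re

# Hypothetical tag -> phrase lists; the real ones would be curated by the team.
TAG_PATTERNS = {
    "injury": ["out for season", "injury report", "torn acl"],
    "trade": ["traded to", "trade deadline"],
}

def tag_article(text):
    """Return the set of tags whose phrases (uni-/bi-/tri-grams) appear in the text."""
    lowered = re.sub(r"\s+", " ", text.lower())  # normalize whitespace and case
    return {tag for tag, phrases in TAG_PATTERNS.items()
            if any(p in lowered for p in phrases)}

tags = tag_article("The point guard was traded to Boston before the trade deadline.")
```

A phrase-matching baseline like this also doubles as a labeling pass for training a later NLP model.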
demirelif_126087_7723160_edemir_Internship ActivityPlan.pdf
  • Innovation Partners, LLC (IPL) is a diversified financial services and consulting firm.
  • IPL consults for a large number of local and multinational clients such as insurance companies, re-insurers, banks, fund managers, broker-dealers, regulatory bodies, government entities, and retail organizations.
  • The primary clients of IPL are the over 600,000 registered investment advisors and securities brokers based in the USA.
  • We have an Accupoint dataset that includes all registered representatives and advisors in the USA.
  • Our data has attributes for representatives and advisors based in various cities and states throughout the country.
  • Accordingly, various public databases can also be used to ascertain additional attributes about this group of advisors, and also to find additional factors that could predict the ideal advisor who may be interested in working for our firm or whom we should target in a recruiting campaign.
  • This will help identify outliers, the distribution of the attributes, and missing values.
  • Unsupervised Learning (Clustering): Our primary dataset has no target value, such as a previous marketing campaign and its responses.
  • After identifying the optimal number of clusters, we will discuss the most beneficial marketing campaign for each cluster/cluster member.
  • Jan 13 - Feb 7: Understand IPL's business characteristics, including representatives and clients.
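A minimal sketch of the clustering step, assuming a single numeric attribute for illustration; a real run would use scikit-learn or similar across the full Accupoint attribute set. Comparing within-cluster error (SSE) across values of k is one common way to pick the optimal number of clusters:

```python
def kmeans_1d(values, k, iters=20):
    """Tiny 1-D k-means for illustration only."""
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    sse = sum(min(abs(v - c) for c in centers) ** 2 for v in values)
    return centers, sse

# Hypothetical advisor attribute (e.g. assets under management, $MM).
aum = [1, 2, 2, 3, 10, 11, 12, 13]
centers2, sse2 = kmeans_1d(aum, 2)
centers1, sse1 = kmeans_1d(aum, 1)
```

Plotting SSE against k and looking for the "elbow" where improvement levels off is one standard heuristic here.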
duttaroma_LATE_117177_7839259_Internship Activity Plan_rdutta4.pdf
  • Leverage advanced data science techniques, processes, and skill sets to develop models and algorithms that provide customer analytics with a focus on customer complaints, and generate data-driven insights that could be implemented to improve customer experience.
  • Extend the analytic approach/developed algorithm to portfolio management and growth initiatives.
  • Knowledge-sharing session with mentor to understand data at the financial organization.
  • Installation of necessary software such as DbVisualizer/MySQL Workbench/SQL Developer, etc., for database tasks.
  • Research recently applied techniques on the existing data.
  • Interact with subject matter expert(s) to better understand the data.
  • Feature generation using a DTM; classifying the complaints to achieve market segmentation; final decision on tools/packages/software to be used in designing the model; progress update review.
  • Create visualizations; create necessary documentation and upload it to GitHub/Bitbucket; final validation and model tuning if required; final presentation.
  • Gaining access to financial data; permissions for a read-only view of relevant datasets.
  • Having access to the necessary data sources to fulfill the set task.
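The DTM (document-term matrix) feature generation mentioned above can be sketched as follows; the complaint texts are invented, and a real pipeline would add tokenization rules, stop-word removal, and TF-IDF weighting:

```python
from collections import Counter

# Hypothetical complaint texts standing in for the real extracts.
complaints = [
    "fees charged twice on my account",
    "late fees and account closure",
    "card declined at checkout",
]

def build_dtm(docs):
    """Build a document-term matrix: one row per document,
    one column per vocabulary term, cells = term counts."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts.get(term, 0) for term in vocab])
    return vocab, rows

vocab, dtm = build_dtm(complaints)
```

The resulting rows can feed directly into a clustering or classification model for the market segmentation step.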
fairgabriel_LATE_30981_7908154_Internship Activity Plan Outline.pdf
  • Project Objective: Gabriel will assess and report on the correctness, completeness, availability, usability, integrity, and security of all digital information that is received, sent, generated, moved, stored, or transformed at ATD, in order to assist with the company's migration to the Master Data Management framework our team needs to implement.
  • Get introduced to systems and brainstorm on the future state architecture and management of
  • Meet the team that manages company product offerings and document their frustrations and issues with
  • Schedule meetings with vendors and evaluate their offerings considering ATD's Data Management
  • Using data science and visual analytics, bin customer deliveries from ATD's 120 distribution centers in terms of cost to the company and profitability of sale (first draft of project)
  • Fix SKU descriptions of products provided by our vendors by scraping them from various online sites and
  • C-level employees' faith in our team's ability to provide an effective culture of data governance.
  • Documentation and understanding of each of ATD's important data stores: customer, transaction, invoices, products,
gargdivya_LATE_116438_7922821_Internship Activity Plan Outline updated.pdf
  • Project Title: Predicting Road Crashes across States
  • Project Objective: ODN worked with Howard University, on behalf of the District Department of Transportation (DDOT), to construct a predictive model anticipating which roads are most likely to have traffic crashes in Washington, DC.
  • This work is in support of the city's broader Vision Zero initiative, and its deliverables are valuable tools for city planners to prioritize traffic safety engineering, education, and enforcement activities to prevent crashes and save lives.
  • Preprocessing data; feature engineering; excluding missing values; creating unique identifiers; model selection; variable selection; model evaluation; results.
  • Map crash data and generate summary statistics.
  • Look for ways to explore traffic management datasets.
  • Test and train models for annual daily traffic for every road in the state; match the crash data to the road networks where each crash happens.
  • Iterative model selection for a categorical target variable.
  • Iterate over the models created to improve accuracy.
  • Outcomes Expected: A road crash prediction model for a state selected by mentor and team, with detailed documentation.
griderhansen_LATE_95247_7954613_Internship Activity Plan - Grider.pdf
  • Formal complaints must then be summarized and assigned the appropriate FINRA code(s) so that they can be processed in a timely manner and allotted the appropriate response.
  • The current processes in place are highly reliant on manual review, as well as a rudimentary dictionary of keywords used in an attempt to isolate specific problems.
  • The data science ("DS") team has been engaged to help automate this process so that client complaints can be accurately addressed.
  • The complaints domain poses a multipart problem that the DS team believes can be mitigated through machine learning.
  • Finally, we will need to use unsupervised clustering to try to identify possible data-related issues that result in customer complaints.
  • exercise given to all new hires on the data science team to acclimate them to the standards of style and polish expected in final deliverables.
  • This project has multiple deliverables that generally accompany most initiatives from the data science team at TIAA.
  • This project assumes that we can acquire 2 years' worth of data from TIAA's Siebel and DRC databases.
  • We will also have to rely on SMEs to help provide domain knowledge on the business root causes (of complaints) and for understanding their process of making determinations (How do they sample?
  • In addition to all of these assumptions that will have to hold, and assuming that we will get appropriate participation from other required SMEs, there is still some risk that we will not be able to solve the problems within the allotted time.
gulleyalexander_LATE_117139_7794321_DSBA 6400 Activity Plan-1.pdf
  • Predict case volumes; identify whether there are any patterns between general business (e.g. transaction count / average transaction amount) and case volume.
  • This forecast will be used in long-term planning (next quarter to the next calendar year) and budget requests.
  • Because this is also my normal job and I have duties outside of this internship, this schedule is subject to adjustment, with some tasks performed within the same week and some weeks without any progress made.
  • Part of the discussion with stakeholders during week 1 will be on target accuracy.
  • The primary method chosen will be time series forecasting.
  • I may attempt some sort of linear regression, but I will likely stick to time series forecasting, as the case volume should continue to grow well past the historical bounds.
  • I expect to use SAS for this, since my company restricts our ability to download packages for open source software.
  • - Overstaffing could create larger-than-expected overhead and increase personnel costs.
  • This forecast will be used in capacity planning for staffing our investigations team.
  • A projection for the next year will also be provided for longer-term planning.
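Since the plan calls for time series forecasting of a growing volume series, one standard method is Holt's linear-trend exponential smoothing. This is a language-agnostic sketch in plain Python with made-up monthly volumes; the SAS implementation would use the equivalent forecasting procedure rather than this code:

```python
def holt_forecast(series, horizon, alpha=0.5, beta=0.3):
    """Holt's linear-trend exponential smoothing with a multi-step forecast."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Smooth the level toward the new observation, then update the trend.
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# Hypothetical monthly case volumes with steady growth.
volumes = [100, 110, 120, 130, 140, 150]
forecast = holt_forecast(volumes, horizon=3)
```

Because the model extrapolates a trend, it can project volumes past the historical bounds, which is why it suits a growing case load better than a bounded regression fit.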
guptasmits_LATE_116219_8104157_Internship Activity Plan_updated-1.pdf
  • The parameters used in this scorecard are accessibility, storage format, relevance and sufficiency, integration, data quality, collection frequency, granularity, etc.
  • Since I would be working with their data and also educating the stakeholders on open-source technical capabilities, it is crucial to establish a form of trust.
  • I will get access to the Mecklenburg Quality of Life dataset, which I will clean and on which I will perform some exploratory data analysis.
  • Then I will use a combination of machine learning algorithms to understand the factors responsible for enhancing the quality of life in Mecklenburg County.
  • I will also be responsible for standardizing the data in the platform, building the machine learning models, and providing some analysis as part of my job objective.
  • I have received access to the Mecklenburg Quality of Life dataset and am looking at the data dictionary to understand the different variables.
  • The Mecklenburg Quality of Life dataset is a huge dataset with information about education, economy, culture, and health and wellness; I am planning to do some data cleaning and exploratory data analysis and to apply machine learning algorithms to identify the factors responsible for enhancing the lives of the citizens of Mecklenburg County.
  • In the next week, I will work on the data wrangling part to understand the important variables and missing values and how to impute them.
  • In the 4th week, I will run a machine learning algorithm to understand the factors responsible for enhancing the quality of life in Mecklenburg County.
  • I am hoping that I will be able to deal with the missing values and then apply machine learning algorithms.
hakasmaggie_LATE_127066_7947164_Internship_Hakas-1.pdf
  • Project Objective: The objective of this project is to work in sprints (~2-week blocks) to establish whether using a specific third-party data source is beneficial for predicting claims, particularly whether it is more beneficial than the models currently in production.
  • optimal results and ideas, and working with the data science team as a whole.
  • party dataset, find those with high cardinality, and figure out the best method to consolidate the variables.
  • I will be looking at univariates by each claim type, variable importance tables from initial models, fill rates and data types, and consulting the data dictionary for how the third party defines levels within these variables.
  • I will then take my suggestions to experts on the home insurance team to see which idea is best to utilize for each variable.
  • As stated before, the goal of this analysis is to aid in the decision of whether or not to purchase their product.
  • Due to previous work done, it is clear that there are some interactions going on in the data that are worth looking into to see if there is lift.
  • There could be patterns, possibly by state or county, or even correlations with other missing data, that could actually be very important to our claims predictions.
  • Outcomes Expected: Edits made to the loss model such that lift is achieved and it would be worth pushing into production for claims-related predictions.
  • However, I will add that any time spent on a model, even if it doesn't work, is still very useful to the organization in knowing that a different direction should be taken.
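One simple way to consolidate a high-cardinality variable, as described above, is to collapse rare levels into a single bucket before modeling; the column name and threshold below are hypothetical:

```python
from collections import Counter

def consolidate_levels(values, min_share=0.10, other="OTHER"):
    """Collapse levels that cover less than min_share of rows into one bucket."""
    counts = Counter(values)
    n = len(values)
    keep = {lvl for lvl, c in counts.items() if c / n >= min_share}
    return [v if v in keep else other for v in values]

# Hypothetical roof-type column from the third-party dataset.
roof = ["asphalt"] * 6 + ["metal"] * 3 + ["slate", "thatch", "copper"]
consolidated = consolidate_levels(roof, min_share=0.15)
```

In practice the threshold (or a domain-driven grouping from the home insurance experts) would be chosen per variable, since over-collapsing can hide the very interactions being tested for lift.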
kishorekumarsudha_LATE_95533_7762293_Sudha KishoreKumar_Internship Activity Plan.pdf
  • Membership reports are currently hosted in an MS Access database and are not accessible to others in the company, other than as a static report posted on the SharePoint site.
  • As part of the internship, the process involved in creating the table for membership and other reports will be coded in SAS.
  • This will be scheduled to run after the data warehouse load every month, eliminating the manual intervention involved in populating the table.
  • NCQA (National Committee for Quality Assurance) requires membership data to be submitted prior to populating the quality measures every summer.
  • Membership and accreditation reports are generated every month using MS Access and posted on the SharePoint site.
  • 02/03/2020 – 02/07/2020: Understand the MS Access queries used to create tables and reports for
  • Analyze and automate the process involved in creating the membership table.
  • Developing technical documentation; system/user acceptance testing of NCQA membership SAS reports.
  • I have worked with reporting/dashboarding tools like Tableau, Cognos, and Crystal Reports.
  • So I assume it should be easy to get a grasp of this new tool and develop dashboards.
laixinxin_121407_7719241_Xinxin_Lai_Activity Plan_V2.pdf
  • Optimize Inventory: A model could be built to predict the volumes of major products on a weekly or monthly basis, especially those items that are under- or over-stocked according to historical data.
  • This model could be run every quarter to identify the customers who complain the most and to contact their sales representatives to take further actions to remedy the situation.
  • By identifying customers likely to complain before they do, the company could take precautionary actions, such as double-checking their orders or sending a sales representative to meet with them, and save a significant amount of time and effort.
  • Use SAS Enterprise Guide to clean up the messy data, join the tables, and build a single big CSV file for the purpose of friendly analytics.
  • 2/20/2020 – 3/05/2020: I will present the clean data and the ABTs I build to the domain experts and manager to see whether they have any suggestions, and from there I will take the next step.
  • They confront a typical and tough issue in the food distribution business, over- or under-stocking, which can cause them to suffer losses daily.
  • I have had meetings with the procurement manager and his assistant to help me understand the data I am dealing with, and I have conducted some cleaning by replacing some of the missing values and outliers with the mean.
  • But it needs more confirmation from the procurement manager that the numbers I filled in make sense for this industry during that specific period of time.
  • - If the outliers I replaced, such as some huge amounts of supplies during a specific time, are real but unexpected data, then it would be biased to throw them away.
  • - Similarly to outliers, missing values may be genuine; for example, a lot of fresh produce may be in short supply under circumstances such as extreme weather.
muchajulian_1892_7723451_Internship Activity Plan.pdf
  • This includes understanding how current scheduling is done by managers, and addressing any factors that may influence decision-making.
  • Methodology : To achieve the aforementioned tasks , there are a variety of methods as well as stages that must be completed to ensure the target outcome .
  • The first step is to gather up the relevant existing data sources to select the cost centers for building the model .
  • Visualizations as well as descriptive statistics will be generated to aid in the selection of the cost centers as well as testing the veracity of the data .
  • A decent portion of time will be spent gathering not-yet-existing data from the cost centers themselves to determine the specific tasks that a position may be expected to perform.
  • Based on the data we receive and the discussions had within the team , a more condensed list of models to be evaluated will be devised .
  • Another potential risk to the internship project is that the model may not be reasonably scaled up for use in the broader company sphere .
  • To avoid this risk , necessary caution and effort must be dedicated to selecting representative cost centers for the population and identifying outliers or potential issues with data through thorough cleaning and analysis .
  • April (to be discussed further): The model will be tested on the selected cost centers to evaluate effectiveness (2-3 weeks).
  • The effectiveness of the model will then be discussed with the team for potential generalized deployment or next steps ( 1 week ) .
paulkabita_LATE_126534_7857116_Activity Plan-1.pdf
  • Information is also available in different formats, like activity tracker data streams, family history and genealogy, Electronic Medical Records (EMR), etc.
  • This project broadly aims to build a tool envisioned as a browser-based dashboard, along with a smartphone app and a personalized digital assistant, that allows the patient to: (1) annotate data for themselves (particularly important for lab values like glucose or for physical activity); (2) create calendars, to-do lists, and reminder functions that are linked so that any entry auto-populates the other sections; (3) generate notifications whenever new data from the electronic health record or other sources are pushed into their account; (4) feed in photos of medications taken via smartphone, etc.
  • Keeping patients healthy and avoiding the worsening of disease stands at the front of the priority list in the healthcare industry.
  • By collecting and analysing aggregated data, we can identify the key attributes that are most important to store and manage for certain medical conditions.
  • By taking advantage of Big Data technologies combined with machine learning, recommendations can be made to patients with medical conditions or to people who want to achieve a healthier lifestyle.
  • ML Modelling: By using predictive modelling and feature importance analysis, we are planning to identify key attributes.
  • The identified features/variables are of utmost importance for designing/building the prototype tool, as they will help prioritize and organize the dashboard.
  • Task 2: Reading scholarly articles in the related area/topic to find out what work has already been done and which areas are less focused on.
  • Task 4: Identifying key attributes to be managed for specific health conditions using machine learning models.
  • This project of building a customized interactive tool is multi-disciplinary, involving collaboration with departments like Computer Science, Psychology, Healthcare, and Communication.
penmathsamanideepvarma_LATE_127089_8026910_Internship Activity Plan.pdf
  • The business needs an effective way to find the anomalies in data without spending too much time on excel sheet analysis .
  • Tableau dashboards are a great way to show these anomalies that is easy for upper management to understand .
  • These risks are expected to be completely avoided and the confidential financial data should be handled very carefully
  • This SAS model uses a linear optimizer to find the minimum cost and maximum profit from selling High Quality Liquid Assets ( HQLAs ) for the bank .
  • Objective/Expected Outcomes: Build a Python module that extracts the output and loads it into the Oracle database as a relational table.
  • Write the function in Python and add it as an additional feature to the existing code.
  • Understanding the existing SAS and Python models that utilize linear optimization for stress scenario forecasting .
  • Analyze the output data from Python model , identify key metrics , and write views using SQL Statements .
  • Objective/Expected Outcomes: From the output loaded into the Oracle database table, identify the key metrics from the Python model, such as overall Sales, Repos, and Runoffs, and create views with aggregates to simplify and visualize them for end users (in this case, higher-level management).
  • Present the Tableau dashboards to the team and my manager while in the UAT environment, get feedback, and make changes/refinements as needed.
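The load-then-aggregate step can be sketched end to end. SQLite is used here only so the example is self-contained; the real pipeline would target Oracle (e.g. via SQLAlchemy/cx_Oracle) with the same SQL shape. The table name, metric names, and row values are made-up stand-ins for the Python model's output.

```python
import sqlite3

# In-memory database standing in for the Oracle target.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE model_output (
        scenario TEXT, metric TEXT, amount REAL
    )
""")

rows = [  # hypothetical output rows from the Python model
    ("base",   "Sales",   120.0),
    ("base",   "Repos",    80.0),
    ("base",   "Runoffs",  40.0),
    ("stress", "Sales",    60.0),
    ("stress", "Repos",    30.0),
]
con.executemany("INSERT INTO model_output VALUES (?, ?, ?)", rows)

# Aggregate view for the dashboard layer (e.g. Tableau) to query directly.
con.execute("""
    CREATE VIEW scenario_totals AS
    SELECT scenario, metric, SUM(amount) AS total
    FROM model_output
    GROUP BY scenario, metric
""")

for row in con.execute("SELECT * FROM scenario_totals ORDER BY scenario, metric"):
    print(row)
```

Keeping the aggregation in a view rather than in the dashboard means end users always see figures consistent with the loaded table.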
richterlyndsay_4676_7722877_Internship Activity Plan - LRichter.pdf
  • The internship remains supervised by the UNC Charlotte Student Affairs Research and Assessment ( SARA ) office .
  • Upon completion of the SASS project , the intern and SARA will evaluate the best direction for the second half of the internship .
  • Also under consideration is a visual analysis of UNC Charlotte data provided by National Survey of Student Engagement ( NSSE ) .
  • The SASS office supports students experiencing a broad range of issues, concerns, or challenges interfering with an individual's ability to be successful academically or personally.
  • In addition , the SASS office administers the Withdrawal for Extenuating Circumstances ( WE ) process , which is the focus of the first project .
  • Reasons that do not qualify for WE include poor academic performance , financial hardship , or school/work/life balance .
  • The goal of the WE process is to best position the student to resume their academic career by minimizing the adverse impact of a term interrupted by unforeseen circumstances.
  • Meet and review progress and visualizations with departmental liaisons
  • Plan 2-3 dashboards addressing specific department objectives (for example, an overview of the numbers of cases, time-series analysis, a deeper dive into major issues that contribute to WEs)
  • Research best practices based on selected datasets
  • Build preliminary series of dashboards (3-5+)
romanochase_127048_7721088_Internship Activity Plan Outline.pdf
  • I have been doing side work with the Advanced Analytics team under Taavo Raykoff since August 2019, which I am converting into my internship.
  • Using the skills I have learned in my master's program, I have already built a predictive model for subscriber activity and plan to implement it for Q1 through MicroStrategy.
  • Our department just integrated R, and I am going to help guide Taavo's Advanced Analytics team through the development of multiple R Shiny apps that will display important data to the GVP and business users.
  • The opportunity in viewership is that the metrics are tracked across a multitude of markets, and there are frequent anomalous events (games, elections, big news) which are not actual issues.
  • This would likely be some form of PCA producing a series of error values that would then be fed into anomaly detection.
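The PCA-then-anomaly-detection idea can be sketched as follows. The viewership matrix here is synthetic (correlated market metrics with two injected spikes), and the 3-sigma threshold is an illustrative choice, not the team's actual detection rule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic viewership metrics across markets: one shared factor plus noise.
n, m = 300, 8
base = rng.normal(0, 1, (n, 1)) @ rng.normal(1, 0.2, (1, m))
X = base + rng.normal(0, 0.3, (n, m))

# Inject two single-market anomalies (off the shared-factor subspace).
X[10, 0] += 6.0
X[200, 3] -= 6.0

# PCA via SVD on centered data; keep the top component.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 1
recon = (Xc @ Vt[:k].T) @ Vt[:k]

# Reconstruction error per observation becomes the anomaly score.
err = np.linalg.norm(Xc - recon, axis=1)
threshold = err.mean() + 3 * err.std()
flagged = np.where(err > threshold)[0]
print(flagged)
```

Observations well explained by the dominant viewing pattern get small errors, while market-specific spikes stand out regardless of which market they hit.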
  • In Q1 , we are building three tools on a prototype basis that would benefit from a highly stylized , interactive user interface :
  • For each of these three, understand the underlying methodology and build highly engaging, interactive user interfaces in R Shiny.
  • These would take advantage of geospatial and traditional visualizations in addition to user-interface elements ( sliders , selection boxes , etc . )
  • Risks : This is new to our department and we are still working through setting up servers and getting R and Shiny to operate smoothly .
  • If these projects do not go well, our GVP may see it as a waste and cut funding for Advanced Analytics/Data Science work.
sadikovibrokhim_LATE_125664_8004535_Internship Activity Plan Outline-Ibrokhim Sadikov-1.pdf
  • Project Title : Advanced Data Analytics
  • Project Objective: Develop an end-to-end Data Analytics PaaS using robust data technologies to leverage the client's Investment Portfolio entities, such as the Macro Investment Portfolio and Credit Investment Portfolio, in order to effectively communicate real-time data analytics, execute trades, and strategically improve decision making.
  • Methodology: We will use a wide variety of methodologies depending on each product development phase, from Data Quality modelling to real-time machine learning models.
  • Major Tasks:
  • 2/24/2020 – 2/25/2020 : Perform research and development tasks on Talend ( an ETL tool )
  • 2/26/2020 – 3/2/2020: Run a POS on Data Quality modelling in Talend and perform Spark
  • 3/3/2020 – 3/6/2020 : Develop High level end to end data analytics pipeline based on a new architecture
  • 3/4/2020 – 5/7/2020: Create a prototype simulation of data extraction, preparation, and analytics for a bigger program.
  • Outcomes Expected: My essential duty is to conduct R&D for advanced analytics, testing different scenarios with new technologies and methodologies that can be used to optimize operations and revenue generation for the client (usually called a POS).
  • Risks: a lot of new technologies need to be practiced
  • Heavy involvement in Data Engineering modelling
  • Time and governance constraints that may hinder on-time submission of deliverables
serapinzach_LATE_29510_7944485_Internship Activity Plan Outline - zserapin resubmission.pdf
  • In conjunction with the network science class I took in the fall , I need to understand the data I will be using , relevant fields , attributes and how I can manipulate it to answer specific questions .
  • I will primarily use igraph in R, but will also need to be familiar with Python's networkX in case it offers specific packages that make it easier to complete one of my outlined tasks.
  • My analysis will conclude with a slide or two summarizing key findings for business partners across functions, many of whom have non-technical roles and backgrounds.
  • Building off of this subject , I then will look to conduct similar analysis on the possible impact a negative rate scenario would have on the model network .
  • First, from an understanding standpoint, it is important that we have a complete grasp of how the particular centrality score is calculated and ensure its accuracy before embarking on further analysis.
  • Next, if business partners were to challenge us on the methodology, it is important that our due diligence is complete to lend validity to the work we are doing.
  • Generating all paths from the network will also require batch computing, as the algorithm has proven unable to run on a simple desktop.
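The centrality-validation and all-paths steps above can be sketched with networkX. The network and node names here are made up, and since the particular centrality score isn't specified, betweenness centrality stands in as the example; the same pattern (recompute on a small graph you can check by hand) applies to any measure.

```python
import networkx as nx

# Small illustrative network -- the real data is the firm's network.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D"), ("D", "E")])

# 1) Recompute the centrality score independently to validate it.
#    On this graph, B and D each lie on 3 shortest paths between other pairs.
bc = nx.betweenness_centrality(G, normalized=False)
print(bc)

# 2) Enumerate all simple paths between two nodes -- the step that needs
#    batch computing at scale, since path counts grow combinatorially.
paths = list(nx.all_simple_paths(G, source="A", target="E"))
print(paths)
```

Checking the library's output against a hand-countable example is exactly the due-diligence step described above.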
  • The completion of said tasks should lead to a quality internship experience that balances the need for both skill development and business value add .
  • The book can then be shared with senior leadership in hopes that they understand the work I do and look to take action to address systems risk.
  • While this would represent valuable and realistic experience, it may distract me from the project as a whole, limiting my ability to comprehensively assess systems risk.
shuklabalya_LATE_1256_8042861_Internship Activity Plan Outline - Balya Shukla-1.pdf
  • Project Title : Data Solutions with Talend
  • Project Objective : Assist the client in the development of a program for internal trading use by identifying important datasets and tools and using analytics to improve the overall program portfolio .
  • 2/24/2020 – 2/25/2020 : Perform research and development tasks on Talend ( an ETL tool )
  • 2/26/2020 – 03/2/2020 : Create a project plan using agile methodology
  • 3/4/2020 – 5/7/2020: Create a prototype simulation of data extraction, preparation, and analytics for a bigger program.
  • Outcomes Expected : The expected outcome is to optimally use data engineering and data analytics in the research and development step of the program creation .
  • I will be creating analytics pipelines in Talend using market/stocks dataset as part of the research .
  • This will help the client gauge the capacity of Talend and its potential of supporting the final trading program .
  • Time constraint: the software licensing process can take time, which might extend the project timeline.
  • Resource constraint: the tools we will be using are not open source, which can limit the maximum
singaravelmurali_110276_7718234_Internship Activity Plan msingara.pdf
  • Muralidharan Singaravel Student Id - 801 059 720 Spring 2020 DSBA Internship Activity Plan
  • Project Title : Customer Complaint Analysis Using Machine Learning
  • Project Objective: The objective of the project is to analyze customer complaint data and answer questions from leadership by identifying key and emerging trends, volumes, themes, and insights that will help in root-cause analytics development.
  • Present information using graphical representations with data visualization; use appropriate plots to show the distribution of categorical and continuous features using Python or Tableau.
  • The classifiers will be scored on various performance measures such as accuracy, F1-score, precision, and recall.
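The scoring step can be made concrete with a small worked example. The complaint categories and predictions below are invented; the point is how accuracy, precision, recall, and F1 are computed per class from true vs. predicted labels.

```python
# Illustrative predictions from a complaint classifier (labels hypothetical).
y_true = ["billing", "billing", "fraud", "service", "fraud", "billing"]
y_pred = ["billing", "fraud",   "fraud", "service", "fraud", "billing"]

def scores(y_true, y_pred, label):
    """Per-class precision, recall, and F1 from label counts."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}")
for label in sorted(set(y_true)):
    p, r, f = scores(y_true, y_pred, label)
    print(f"{label:8s} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In practice a library such as scikit-learn would compute these, but the hand-rolled version makes the definitions explicit for the report.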
  • Final presentation for Project Sponsors which contains high-level takeaways for executive level stakeholders , with a few key messages to aid their decision- making process
  • The reports will also give Senior Leaders and teams visibility into volume trends and identify process, policy, and customer-interaction breakdowns, in an effort to develop corrective actions that promote stronger customer and team member experiences.
  • Not all the Python libraries are on the bank technology team's approved list, which may hinder
summeykelsey_1592_7722710_Internship Activity Plan Outline.pdf
  • They only track reported revenue from the initial contract , which means the value of bringing in a large commercial or industrial customer is not realized .
  • Additionally, because of this, strategic efforts for targeting high-value customers are not being utilized to maximize ROI.
  • Methodology : SQL ( SSMS ) , Python , Excel , Salesforce , Power BI
  • It is important to note that due to the volume and the way the Economic Development team reports
  • Create a new database and tables in SSMS to store data unique to our project .
  • Check in with the Science team and Vice President on the revenue calculation and query in SQL.
  • Outcomes Expected: My expected outcome as an intern is to find and track ACTUAL revenue in every jurisdiction; create visualizations in Power BI to display to our entire Enterprise Team and various other stakeholders, including our CFO; introduce predictive analytics to the team by using it to aid in the creation of targeted sales & marketing plans based on industry forecasted growth and inter-jurisdictional load growth; create a profit and loss statement; and present my findings to the team.
  • Risks : Due to the complexity of the task and our billing system , our numbers will never be 100 % accurate .
  • There are various factors that go into ensuring a project and account number match up , and most of the time , the truest answer remains ambiguous .
tomasikmarie_126529_7621563_Internship_Activity_Plan.pdf
  • Marie Tomasik 800A Beaty St. Davidson , NC 28036 mtomasik @ uncc.edu
  • Professor Hauge 9201 University City Blvd 215 Bioinformatics Charlotte , NC 28223
  • The goal of this internship project will be to build a predictive model for unionization within Ingersoll Rand's manufacturing plants.
  • R will be the primary tool used to build the models , and the data will be pulled using Visier , Glint , and an API through R to access data published by the Bureau of Labor Statistics .
  • Training for the internship includes learning the data storage and visualization tools that Ingersoll Rand uses.
  • The next two weeks will be spent developing hypotheses and exploring data sources .
  • To develop hypotheses about what causes plants to unionize, we will be meeting with the project stakeholders.
  • Specifically, when we look at factors around how employees feel about their jobs, workplace, managers, and coworkers, we will need to define which specific details from the engagement survey will be put into the model to test our hypotheses.
  • I will present the exploratory analysis results to the stakeholders around this time and possibly make edits to the top priority hypotheses .
  • I will build the preliminary model in R that will include our top priority hypotheses .
vavilalasrivan_LATE_19848_7741393_Srivan Vavilala - Internship Activity Plan.pdf
  • In the week of February 10th , there will be an in-depth check-in between my mentor and I to ensure that the projects are on track to meet all the deliverable requirements .
  • The objectives range from basic data analysis to the application of NLP principles in a classification model .
  • First, the goal is to slowly migrate their business analytics from one tool, Firebase, to a newer one called MixPanel.
  • The main reason for the migration is that MixPanel is an industry standard, and the company would like to explore the robustness of this analytical tool to see if they can derive new business insights that they couldn't on Firebase.
  • Therefore , I believe that there is a potential application of certain NLP principles that could help in the automatic categorization of the products being pulled in from the various data sources .
  • Once they are successfully recreated, I plan on using my newfound knowledge of this analytical tool to derive new business insights, which I will report through various visualizations and other metrics.
  • However, I plan on exploring my options in R as well to ensure that all my bases are covered and that I don't miss out on potentially useful solutions to the problem.
  • The main goal will be to create an algorithm that can distinguish and retag products coming in from external databases .
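The retagging goal above can be sketched as a tiny text classifier. This is a baseline sketch only: the multinomial Naive Bayes here is one plausible NLP approach, and the categories and product strings are made up rather than drawn from the company's actual databases.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled product names for training the retagger.
train = [
    ("organic green tea 20 bags", "beverages"),
    ("cold brew coffee concentrate", "beverages"),
    ("wireless bluetooth earbuds", "electronics"),
    ("usb-c charging cable 2m", "electronics"),
    ("cotton crew neck t-shirt", "apparel"),
    ("slim fit denim jeans", "apparel"),
]

# Count word frequencies per category.
word_counts = defaultdict(Counter)
cat_counts = Counter()
for name, cat in train:
    cat_counts[cat] += 1
    word_counts[cat].update(name.split())

vocab = {w for c in word_counts.values() for w in c}

def classify(name):
    """Multinomial Naive Bayes with Laplace smoothing over product words."""
    best, best_lp = None, -math.inf
    for cat in cat_counts:
        lp = math.log(cat_counts[cat] / len(train))
        total = sum(word_counts[cat].values())
        for w in name.split():
            # Smoothing so unseen words don't zero out a category's score.
            lp += math.log((word_counts[cat][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

print(classify("herbal tea sampler"))
print(classify("bluetooth speaker"))
```

A production version would likely use richer features (n-grams, embeddings) and a larger training set, but the shape of the solution is the same: learn word-category associations, then retag incoming products by highest score.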
  • If possible , the newly categorized database will be imported into MixPanel and another round of data analysis will be run on it as before .
  • Consequently , the company may potentially lose revenue from the sale of the product due to this error .
vegesnakovidh_LATE_31534_7835835_Internship Activity Plan - kvegesna.pdf
  • The project I will be working on is with Dr. Colby Ford ( mentor ) , Dr. Eugenia Lo and her team from the biology department at UNC Charlotte .
  • Some of Dr. Lo's Ph.D./Master's students (Cambel and Kareen) have collected data related to people who have malaria.
  • The data they have has been generated from DNA sequencing kits and other methods they have used to identify the specific genes they are interested in studying .
  • Other researchers, at a different university, have used a different kit to generate identical ends for the DNA strands.
  • Because the kit Dr. Lo and her students are using splits the DNA at random points, it is a little tricky to analyze the results generated from the kit.
  • A haplotype is a group of alleles in an organism that are usually inherited together from a single parent or source .
  • Additionally , we want to see if we can use any machine learning techniques to find patterns within the dataset and possibly predict what type of clone a person with malaria is likely to have based on information about the patient .
  • Based on the cleaned dataset, we will try to see whether we can create visualizations to better understand what is happening with the data and assist Dr. Lo.
  • The work done during each chunk is subject to change depending on the progress made previously, technical issues that may arise, holidays, and other factors.
  • Try to build possible machine learning solutions to classify or predict what type of
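The classification idea above can be sketched with a simple k-nearest-neighbour model. The feature names, values, and clone labels below are invented for illustration; the real features would come from the patient information and sequencing data described earlier.

```python
import math
from collections import Counter

# Hypothetical patient features -> clone type.
# features: (age, parasite_density, prior_infections)
train = [
    ((12, 8.2, 0), "clone_A"),
    ((15, 7.9, 1), "clone_A"),
    ((34, 2.1, 4), "clone_B"),
    ((41, 1.8, 5), "clone_B"),
    ((29, 2.5, 3), "clone_B"),
]

def predict(x, k=3):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = sorted((math.dist(x, feat), label) for feat, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

print(predict((14, 8.0, 1)))
print(predict((38, 2.0, 4)))
```

With real data, features would need scaling and the choice of model (k-NN, random forest, etc.) would be compared on held-out samples.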
xiachunqiu_LATE_116382_7772347_Internship Activity Plan_Chunqiu Xia-1.pdf
  • Mentor/Company : University of North Carolina-Charlotte , Belk College , Ming Chen
  • Apply statistical models using empirical data to analyze marketing research questions and explore insights into consumers' behavior on social media.
  • The project is an exploration of consumers' behaviors when they use social media such as Yelp.
  • The specific project deliverables are to help Professor Chen conduct preliminary data analysis and model estimation.
  • The timeline of this project started in early January 2020 and will last through the end of the spring 2020 semester.
  • Course work is based on the marketing analytics and statistical model estimation .
  • Second, try different models to assess the effect of customers' behaviors on social media.
  • Review research papers on the Time-Varying Effectiveness model, and find how it could be applied to the Yelp dataset.
  • In addition, we are going to find out how well these models can help restaurants save money through resource allocation.
  • Since the effectiveness of marketing is time-varying, I don't think we can get a highly accurate model to predict sales.
xiaodiwen_LATE_117612_7774320_Internship Activity Plan_Updated.pdf
  • Project Title: How to Enhance Online Hotel Ad Effectiveness Based on Real-World Data: Mobile Eye-Tracking and Machine Learning Tell
  • The goal of this project is to explore consumers' attention and their subsequent purchase behaviors when shopping online, using mobile eye tracking and machine learning. In particular, it explores differences in consumers' eye fixation counts across different Areas of Interest in online hotel booking advertisements, and then provides suggestions for enhancing advertising effectiveness.
  • The specific project deliverables are to help Professor Chen conduct several literature reviews, preliminary data analysis, model estimation, and report writing.
  • This project uses machine learning, a powerful computational methodology, to investigate consumers' Areas of Interest on an online hotel booking website with a mobile eye-tracking device.
  • 02/03/2020 - 02/14/2020: In this period, we are going to start the research project by conducting the first part of the literature review, which is to get familiar with Google Scholar and read related
  • 02/17/2020 – 02/28/2020: In these two weeks, we are going to start the second part of the literature review, writing reference summaries for each research study.
  • 05/04/2020 – 05/10/2020: In this period, I am going to write the overall report and explain how I explored and developed the project objectives, tasks, and outcomes for the internship.
  • The final report may include an executive summary, introduction, approach, findings, recommendations, and appendices.
  • The expected outcomes are to conduct several literature reviews in the machine learning and marketing research areas related to the mobile eye-tracking method; to apply one of the most advanced techniques, YOLO (You Only Look Once), to analyze the empirical data collected by the eye-tracking device; and finally to provide effective suggestions to help evaluate the effect of online hotel advertisement layout on the booking rate.
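The fixation-counting step can be sketched as follows. This assumes Area-of-Interest bounding boxes have already been produced by a detector such as YOLO; the AOI names, box coordinates, and fixation points below are all made up for illustration.

```python
# AOI bounding boxes for a hotel-booking page, as (x1, y1, x2, y2) in pixels.
# In the real pipeline these would come from the YOLO detector's output.
aois = {
    "hotel_photo": (0, 0, 400, 300),
    "price":       (420, 40, 520, 90),
    "reviews":     (420, 120, 560, 200),
}

# Gaze fixation points (x, y) from the mobile eye tracker (hypothetical).
fixations = [(50, 60), (200, 150), (430, 55), (470, 70), (500, 160), (600, 400)]

def count_fixations(aois, fixations):
    """Count how many fixations land inside each AOI's bounding box.

    A fixation falling in overlapping boxes is counted for each of them.
    """
    counts = {name: 0 for name in aois}
    for fx, fy in fixations:
        for name, (x1, y1, x2, y2) in aois.items():
            if x1 <= fx <= x2 and y1 <= fy <= y2:
                counts[name] += 1
    return counts

print(count_fixations(aois, fixations))
```

The resulting per-AOI counts are the quantity compared across ad layouts when evaluating which elements actually capture attention.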
  • Because the size and cost of the equipment make the experiment hard to set up, and because placing the eye-tracking device on a participant's head may give them a feeling of being controlled, the spontaneity of their behavior in the test might be limited.